Description

Field of the Invention 

The invention relates to a method and apparatus for recognizing characters printed on a document
which is skewed relative to an image scanner.

Prior art and problems 

Optical character recognition (OCR) technology has been used to recognize character images of a
document. OCR technology includes the steps of scanning the image of the document, comparing the analog
signals generated by the scanning operation with a threshold value to generate binary signals representing
the image of the document, storing the binary signals in an image buffer, segmenting the character images,
that is, breaking the image of the document into separate, distinct images of each character, recognizing
the segmented character images, and outputting the results of the recognition.

The segmentation includes a step of separating the character rows from each other. Fig. 1B shows
the image of the document 201 stored in the image buffer 202, where the document 201 has been scanned
without any skew with respect to the scanner. To separate the character rows from each other, a shadow
projection technique has been used. The shadows of all characters are projected to generate projections
203 and 204, which represent the positions of the first and second character rows along the Y axis of the
image buffer. A problem arises when the document is skewed or inclined with respect to the scanner, as
shown in Fig. 1C. The skewed document 205 generates a single long projection 207 that contains both the
first and second character rows. In this case, the character images in the first character row are not
separated from those in the second character row, so the two character rows are treated as a single row,
and the characters of both rows are mixed with each other and printed in a single character row of the
printout indicating the OCR results for the document 205. To assure the separation of the character rows,
the maximum skew angle allowed for a standard A4 size document is about 1 degree. To separate the
character rows of more heavily skewed documents, Japanese patent application 56-204636 proposes a solution
in which the character rows are divided into plural blocks 209, as shown by the vertical dotted lines 206
in Fig. 1C, a block projection, e.g. 208, is generated for each block, and the characters of a block 209
are segmented based upon the block projection 208. The continuity from one block 209 to the next block 210
is then detected to recognize the characters of one character row. Although the patent application
56-204636 somewhat alleviates the problem, it requires a complicated process for finding the continuity of
the blocks. An inherent problem of the technology using projections is that it does not operate
successfully when characters and a photograph are arranged side by side in the horizontal direction of the
document.
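
By way of illustration only, the failure of the projection technique under skew may be sketched as follows. The sketch is not taken from the cited art: the function names `row_projection`, `count_row_bands` and `make_rows`, and the toy image in which each character row is drawn as a one-pel-high line, are assumptions made for this example.

```python
def row_projection(image):
    """Return, for each horizontal bit line, whether it contains any black pel."""
    return [any(row) for row in image]


def count_row_bands(image):
    """Count runs of consecutive bit lines that contain black pels."""
    bands, inside = 0, False
    for has_black in row_projection(image):
        if has_black and not inside:
            bands += 1
        inside = has_black
    return bands


def make_rows(width, baselines, slope):
    """Draw 1-pel-high 'character rows' whose baseline drifts by `slope` pels per column."""
    height = max(baselines) + int(abs(slope) * width) + 2
    image = [[0] * width for _ in range(height)]
    for base in baselines:
        for x in range(width):
            image[base + int(slope * x)][x] = 1
    return image


straight = make_rows(40, [2, 6], 0.0)   # two rows, no skew
skewed = make_rows(40, [2, 6], 0.2)     # same rows, strongly skewed
print(count_row_bands(straight))  # the two rows give two separate projections
print(count_row_bands(skewed))    # the projections merge into a single band
```

With the skewed image the drift of each row across the page exceeds the gap between the rows, so the projection can no longer tell them apart.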

R. L. Hoffman and J. W. McCullough, Segmentation Methods for Recognition of Machine-Printed
Characters, IBM Journal of Research and Development, vol. 15 (1971), 153-165, describes an algorithm for
the separation of touching characters. Scanned characters are examined for their vertical densities (i.e.,
the number of black pixels in each vertical line), and low density lines are selected as the boundaries of
characters. Hence the method in the article apparently differs from that of the present invention.

K. Y. Wong, R. G. Casey, and F. M. Wahl, Document Analysis System, IBM Journal of Research and
Development, vol. 26 (1982), 647-656, describes a general concept of an office system for document
analysis. It includes a description of the segmentation of characters using the projection method
described hereinabove. The concept of the article apparently differs from that of the present invention.

R. G. Casey and G. Nagy, Recursive Segmentation and Classification of Composite Character Patterns,
Proceedings of 6th International Conference on Pattern Recognition (1982), describes a segmentation
method for the case where several characters are connected. As the first step of segmentation, the method
presented in Wong's article is used. The algorithm described is then applied if a segmented block is
presumed to consist of connected characters. The concept of the article apparently differs from that of
the present invention.

R. G. Casey and C. R. Jih, A Processor-Based OCR System, IBM Journal of Research and Development,
vol. 27 (1983), 386-399, describes a general method for OCR systems. In its algorithm, characters are
segmented after baseline detection. The Decision Tree Algorithm is also described there.
The article does not disclose the concept of the present invention.

R. G. Casey, S. K. Chai, and K. Y. Wong, Unsupervised Construction of Decision Networks for Pattern
Classification, Proceedings of IEEE 7th International Conference on Pattern Recognition (1984), describes a
recognition algorithm, and contains no description of segmentation.

Summary of the invention

It is the object of the present invention to provide a method and apparatus for recognizing the
characters and symbols of a document which is skewed or inclined with respect to an image scanner.
This object is achieved with the method as defined in claim 1 and the apparatus as defined in claim 6.

A document is scanned by an image scanner, and image data representing the image of the document
is stored in an image storage means. Rectangles contacting and surrounding the outer boundaries of each
image of the plural character rows in the image storage means are generated, and the positions of the four
edges of each rectangle in the XY coordinates of the image storage means are detected. The size of each
rectangle is calculated based upon the detected positions of the edges and is compared with an expected
size range for the characters and symbols to be recognized. The positions of the rectangles falling into
the size range are stored in a first list as position data, wherein the position data of the rectangles of
the characters and symbols over the plural character rows are arranged in the first list in order from the
rectangle at one end to the rectangle at the other end along the direction of the X axis of the XY
coordinates.

The position data of the rectangles in the first list are sequentially fetched in the arranged order to
detect the size of each rectangle and to determine the average size of all rectangles stored in the first
list. Again, the position data of the rectangles in the first list are sequentially fetched in the arranged
order to find a first rectangle falling into a size range set based upon the average size. The fetch
operations are continued to find a second rectangle having a bottom left corner located within
predetermined distances in the X and Y directions from the bottom left corner of the first rectangle. The
fetch operations are continued to find a third rectangle having a bottom left corner located within the
predetermined distances in the X and Y directions from the bottom left corner of the second rectangle. The
operations continue until a predetermined number of rectangles in one character row satisfying the
condition have been found. When the predetermined number of rectangles has been found, the skew of the
character row in the XY coordinates is calculated based upon the positions of the bottom left corners of
these rectangles, and this detected skew is treated as the skew of the document.

Again, the position data of the rectangles in the first list are sequentially fetched in the arranged order,
and the position of the bottom edge of each rectangle in the Y axis is corrected by the above skew of the
document. The corrected position is hereinafter called the virtual position of the rectangle. Among the
virtual positions calculated during the fetch operations, the one located at the highest position on
the document is detected, and the detected highest virtual position is stored in a register.

Again, the position data of the rectangles in the first list are sequentially fetched in the arranged order,
and the virtual position of each rectangle is calculated again. A comparison is then made as to whether the
virtual position of each rectangle falls within a predetermined range from the highest position stored in
the register. This range is selected to catch the rectangles of characters, such as the lowercase
characters "p" and "y", having a leg extending below the base line of the character. During the fetch and
compare operations, the position data of each rectangle which falls into the range is transferred from the
first list to a second list, and at the end of the operations, the position data of the rectangles in the
first character row have been stored in the second list in order from the rectangle of the first character
to the rectangle of the last character.

A recognition unit sequentially fetches the position data of the rectangles stored in the second list,
reads the image data in the image storage means surrounded by the rectangle specified by the position
data, and recognizes the image data.

In order that the invention may be fully understood, a preferred embodiment will now be described with
reference to the accompanying drawings:

Fig. 1A shows the image of the skewed document stored in the image buffer 23 which is recognized in
accordance with the present invention.
Figs. 1B and 1C show a prior technology for segmenting character images on the document.
Fig. 2 shows a block diagram of circuit configurations for performing the method of the invention.
Fig. 3 shows a flowchart of the method of the invention.
Fig. 4 shows the detection of the positions of the four edges of the rectangle which contacts the outer
boundaries of the character A.
Figs. 5A, 5B and 5C show the detection of the rectangle, the bottom left corner of which is located within
predetermined distances in the X and Y axes from the bottom left corner of the preceding rectangle.
Fig. 6 shows the generation of the virtual positions of the rectangles.

1 Document
20 Control unit
21 Scanner
22 Threshold circuit
23 Image buffer
24 Character position detect device
25 Table memory
26 Skew calculation device
27 One row select device
31 Recognition unit
32 Output buffer

Description of embodiment 

Referring to Fig. 1A, a document 1 including plural character rows is shown. To simplify the
drawing, only three character rows are shown. Further, the spaces between the characters in both the
horizontal and vertical directions on the document 1 are wider than those of an actual document, to aid
understanding of the present invention. Each character row, therefore, could include more characters than
are shown in Fig. 1A.

The document 1 is shown as being inclined or skewed by a skew angle in the XY coordinates.

The document 1 is scanned by a document scanner 21 shown in Fig. 2. The document scanner 21 is
provided with a light source, an optical sensor array and means for moving the document 1 relative to the
optical sensor array. The optical sensor array includes plural optical sensor elements, such as Charge
Coupled Devices, arranged at a density or resolution of 240 pels/inch. For example, 2016 optical sensor
elements arranged in one row in the horizontal direction are required to scan an A4 size document with a
width of 210 mm. Each element defines one picture element (pel). The light from the light source is
reflected by the document 1, and the reflected light representing an image of the document 1 is detected
by the optical sensor array, which generates electrical analog signals of the pels. The analog signals are
supplied to a threshold circuit 22 shown in Fig. 2, which compares each of the analog signals with a
threshold level. If the analog signal exceeds the threshold level, the threshold circuit 22 generates a
binary "0" signal representing a white level. If the analog signal does not exceed the threshold level,
the threshold circuit 22 generates a binary "1" signal representing a black level. The optical sensor
elements arranged in one line in the horizontal direction define a scan line 2 shown in Fig. 1A. As the
document 1 is moved relative to the optical sensor elements, the scan line 2 moves downwardly in the Y
direction on the document 1, and the image data of the document 1 is gradually stored in an image buffer 23.
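
The behavior of the threshold circuit 22 on one scan line may be illustrated in software as follows. This is a minimal sketch only; the function name `threshold_line`, the normalized reflectance levels and the threshold value 0.5 are assumptions made for the example and are not given in the description.

```python
def threshold_line(analog_levels, threshold):
    """Map one scan line of analog levels to bits: 0 = white, 1 = black.

    A level that exceeds the threshold is white; any other level is black.
    """
    return [0 if level > threshold else 1 for level in analog_levels]


# Hypothetical reflectance levels for five pels of one scan line.
scan_line = [0.9, 0.8, 0.2, 0.1, 0.85]
bits = threshold_line(scan_line, 0.5)
print(bits)
```

A level exactly equal to the threshold does not exceed it and is therefore treated as black, following the wording of the description.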

A control unit 20 is shown in the Fig. 2. The control unit 20 controls the operations of all blocks in the 
Fig. 2. For simplifying the drawings, however, the connections between the control unit 20 and the blocks 
are not shown in the Fig. 2. 

The processing operations in accordance with the present invention are generally classified as follows:

(A) Scan and store of the image of the document

(B) Segmentation of character images and detection of the positions thereof to form a first list in a table
memory 25

(C) Detection of skew angle of the document

(D) Reorder of the contents of the first list in the table memory 25 and assemble of a second list in a
table memory 30

(E) Recognition of the character image specified by the contents of the second list

Detailed descriptions of (A) through (E) follow.

(A) Scan and store of the image of the document

It is assumed that the image buffer 23 has a small storage capacity 23A for storing a portion of the 
skewed document 1 including three character rows. 

The control unit 20 responds to an operator's depression of a start switch of the scanner 21, or to the
termination of the processes of the step (E) described hereinafter, to start the store operation of the
image buffer 23. Since the portion shown in Fig. 1A is the starting part of the document 1, the depression
of the start switch causes the control unit 20 to start the operations. The control unit 20 controls the
scanner 21, the threshold circuit 22 and the image buffer 23 to start the scan operations, to supply the
electrical signals from the scanner 21 to the threshold circuit 22, and to store the binary signals
representing the image data from the threshold circuit 22 into the image buffer 23. The control unit 20
monitors the scan and store operations, and stops them and starts the next operations (B) when the image
buffer 23 is filled with the image data. The operations are shown as blocks 301 and 302 in Fig. 3.

(B) Segmentation of character images and detection of the positions thereof to form a first list in a table
memory 25

The control unit 20 activates a character position detect device 24, which accesses the image buffer 23 to
sequentially fetch the data of the horizontal bit rows in order from top to bottom. The purposes of the
operations are (i) to segment each character image, i.e. to break the scanned image of the document into
separate, distinct images of each character, by generating rectangles each of which surrounds one
character image, (ii) to detect the positions of the rectangles in the XY coordinates of the image buffer 23
and (iii) to store the data representing the positions of the rectangles in a table memory 25 (Fig. 2) to
form a first list. The operations (B) are shown as a block 303 in Fig. 3. Detailed descriptions of the
above (i), (ii) and (iii) are as follows.

(i) The horizontal data bit lines of the image buffer 23 are sequentially fetched in order from the top
to the bottom by the character position detect device 24. The character position detect device 24
determines the presence of 1 bits, i.e. black pels, in each bit line, generates a rectangle which
contacts the outer edges of the pattern of the black pels, and calculates the position of the rectangle in
the XY coordinates. It is noted that the document 1 may include black smear blocks, long lines, a
photograph, etc. which are smaller or larger than the expected sizes of the characters and symbols to be
recognized by the character recognition unit 31. The character position detect device 24 detects these
objects and ignores them.

Describing this in more detail with reference to Figs. 1A and 4, the scan line 2 corresponds to a data
bit line of the image buffer 23. When the bit line 2A is supplied to the character position detect device
24, it detects the black pel at the top of the character A. As the subsequent bit line group 41 is supplied
to the character position detect device 24, it generates a rectangle 42. The character position detect
device 24 determines the continuity of the black pels of the image in the supplied bit lines, and grows the
rectangle if it detects continuity. Referring to Fig. 4, the rectangle is gradually grown as shown
by 43 and 44, due to the continuity of the black pels in the bit line groups 45 and 46.
The bit line 2C is the final bit line of the bit line group 46.

The character position detect device 24 detects the lack of continuity of the black pels in the Y
direction by examining the bit line 2C+1, i.e. the bit line following the bit line 2C. In the same manner,
the character position detect device 24 detects the lack of continuity in the X direction.
Then, the character position detect device 24 detects the termination of the black pels in the X
direction at the bit line 2B and the termination of the black pels in the Y direction at the bit line 2C,
and terminates or fixes the growth of the rectangle, whereby the rectangle 44 contacting the outer edges of
the continuous black pel group of the character A is generated.
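
For illustration, the generation of a rectangle contacting the outer edges of each continuous black pel group may be sketched as below. The sketch is an assumption-laden stand-in rather than the device 24 itself: it finds each group with a breadth-first flood fill over 8-connected pels instead of growing the rectangle bit line by bit line, and the function name `bounding_rectangles` and the toy image are hypothetical.

```python
from collections import deque


def bounding_rectangles(image):
    """Return (top, bottom, left, right) for each connected group of black pels.

    Groups are visited in the order their first pel is met when the bit lines
    are read from top to bottom and each line from left to right.
    """
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    rects = []
    for y0 in range(height):
        for x0 in range(width):
            if image[y0][x0] != 1 or seen[y0][x0]:
                continue
            top = bottom = y0
            left = right = x0
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                # Grow the rectangle so that it still contains this pel.
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            rects.append((top, bottom, left, right))
    return rects


image = [
    [0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 1, 1],
    [0, 1, 1, 1, 0, 1, 1],
    [0, 1, 0, 1, 0, 0, 0],
]
rects = bounding_rectangles(image)
print(rects)
```

The first tuple corresponds to the A-like shape on the left and the second to the small block on the right; no recognition of the shapes is attempted.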

(ii) When the character position detect device 24 completes the rectangle 44, it calculates the following
positions of the rectangle 44 surrounding the character "A" in the XY coordinates, as shown in Fig. 4.

YTA ... Position of the Top edge of the rectangle of the character A in the Y axis in the image buffer 23
YBA ... Position of the Bottom edge of the rectangle of the character A in the Y axis in the image buffer 23
XLA ... Position of the Left edge of the rectangle of the character A in the X axis in the image buffer 23
XRA ... Position of the Right edge of the rectangle of the character A in the X axis in the image buffer 23

Here the first character represents the X or Y axis, the second character represents the Top, Bottom,
Left or Right edge of the rectangle, and the third character represents the character surrounded by the
completed rectangle.

It is noted that the character position detect device 24 does not perform recognition as to whether
the continuous black pel group represents the character A. The character position detect device 24
merely detects the rectangle contacting each of the continuous black pel groups and its position and
size in the XY coordinates.

Next, the character position detect device 24 determines whether the sizes of the rectangle in the X and Y
directions fall into a range set for the expected sizes of the characters and symbols to be recognized. As
described hereinbefore, the purpose of determining the sizes of the completed rectangle is to ignore black
blocks, long lines and photographs, the sizes of which are outside the expected sizes of the characters
and symbols. When the character position detect device 24 finds a rectangle whose sizes are outside the
expected sizes, it ignores the rectangle and does not store the position data of such a rectangle in the
next step (iii).
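
The size determination may be sketched as follows. The description gives no numerical limits, so the function name `within_expected_size` and the minimum and maximum sizes used here are purely hypothetical values chosen for the example.

```python
def within_expected_size(rect, min_size, max_size):
    """Check a rectangle against an expected (width, height) range in pels.

    rect is (top, bottom, left, right) with inclusive edge positions.
    """
    top, bottom, left, right = rect
    width = right - left + 1
    height = bottom - top + 1
    return (min_size[0] <= width <= max_size[0]
            and min_size[1] <= height <= max_size[1])


rects = [
    (10, 40, 5, 25),       # character-sized rectangle
    (12, 13, 0, 900),      # long ruled line
    (50, 52, 30, 31),      # small speck of smear
    (100, 600, 100, 700),  # photograph
]
# Assumed limits: 4 to 60 pels wide, 8 to 60 pels high.
kept = [r for r in rects if within_expected_size(r, (4, 8), (60, 60))]
print(kept)
```

Only the rectangles in `kept` would go on to be stored in the first list; the others are dropped without any attempt at recognition.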

(iii) The character position detect device 24 stores the position data YTA, YBA, XLA and XRA into an
entry, e.g., address 2, of a table memory 25 to form the first list, as shown in Table 1.

TABLE 1: FIRST LIST

ADDRESS   TOP EDGE   BOTTOM EDGE   LEFT EDGE   RIGHT EDGE   POINTER
1                                                           2
2         YTA        YBA           XLA         XRA          0

The entry address 1 is the initial entry when the table access is started. The device 24 stores the
address 2 as the pointer of the entry 1, which indicates that the next entry to be accessed is the entry 2.
The value 0 of the pointer of the entry 2 indicates that the entry 2 is the last entry, so that the table
access operations are terminated there.

In the same manner, the character position detect device 24 completes a rectangle 45 surrounding the
character r, as shown in Fig. 1A, and calculates the position data YTr, YBr, XLr and XRr. The bottom
edge of the rectangle 45 lies on a bit line 2D. The character position detect device 24 stores the data
YTr, YBr, XLr and XRr into an entry address 3, as shown below, and compares XLr with XLA to determine
which of the rectangles is closer to the value X = 0. In this case, since the rectangle of the character A
is located to the left of the rectangle of the character r, the character position detect device 24 stores
the address 3 as a pointer in the entry 2.

TABLE 2: FIRST LIST

ADDRESS   TOP EDGE   BOTTOM EDGE   LEFT EDGE   RIGHT EDGE   POINTER
1                                                           2
2         YTA        YBA           XLA         XRA          3
3         YTr        YBr           XLr         XRr          0



The contents of the first list in the table memory 25 when the characters A, r, B and C have been 
processed are as follows: 

TABLE 3: FIRST LIST

ADDRESS   TOP EDGE   BOTTOM EDGE   LEFT EDGE   RIGHT EDGE   POINTER
1                                                           5
2         YTA        YBA           XLA         XRA          3
3         YTr        YBr           XLr         XRr          0
4         YTB        YBB           XLB         XRB          2
5         YTC        YBC           XLC         XRC          4

It is noted that as the process proceeds, the pointers of the entries have been changed so that the
position data are accessed in the sequence of C, B, A and r, i.e. in the direction from left to right on
the document 1. That is, the table entry 1 stores the pointer 5, which addresses the entry 5 storing the
position data of the rectangle of the character C; the entry 5 stores the pointer 4, which addresses the
entry 4 storing the position data of the rectangle of the character B; the entry 4 stores the pointer 2,
which addresses the entry 2 storing the position data of the rectangle of the character A; and the entry 2
stores the pointer 3, which addresses the entry 3 storing the position data of the rectangle of the
character r.

It is apparent that whenever the character position detect device 24 completes a new rectangle, the
device 24 compares the position in the X axis of this new rectangle with the X axis positions of the old
rectangles already stored in the first list, and modifies the pointers of the new and old rectangles to
cause the access operations for them to be made in order from the left end rectangle to the right end
rectangle in the direction of the X axis. The contents of the first list in the table memory 25 when all
rectangles for the character and symbol images have been processed are shown in the following Table 4.

TABLE 4: FIRST LIST

ADDRESS   TOP EDGE   BOTTOM EDGE   LEFT EDGE   RIGHT EDGE   POINTER
1                                                           7
2         YTA        YBA           XLA         XRA          9
3         YTr        YBr           XLr         XRr
4         YTB        YBB           XLB         XRB          10
5         YTC        YBC           XLC         XRC          12
6         YTq        YBq           XLq         XRq          11
7         YTD        YBD           XLD         XRD          13
8         YTP        YBP           XLP         XRP          14
9         YT,        YB,           XL,         XR,          15
10        YTn        YBn           XLn         XRn          16
11        YTz        YBz           XLz         XRz          3
12        YTm        YBm           XLm         XRm          17
13        YT1        YB1           XL1         XR1          18
14        YTy        YBy           XLy         XRy          6
15        YTx        YBx           XLx         XRx          8
16        YTw        YBw           XLw         XRw          2
17        YTu        YBu           XLu         XRu          4
18        YTt        YBt           XLt         XRt

The pointers in the first list in the table memory 25 indicate that the initial access to the first list is
made at the entry address 1 and the remaining entries are accessed in the sequence shown in the lower
part of Fig. 1A. That is, in the first list shown in Table 4, the position data of the rectangles are
arranged in order from the leftmost rectangle to the rightmost rectangle in the direction of the X axis of
the XY coordinates of the image buffer 23. The control unit 20 detects the termination of the operations (B)
and starts the next operations (C).
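
The pointer handling of the first list can be illustrated with a small linked table. The following is a sketch under stated assumptions: the helper names `insert_rectangle` and `traversal`, the dictionary layout and the pel coordinates are invented for the example, and only the addresses and pointer values are meant to mirror Tables 1 to 3.

```python
def insert_rectangle(table, rect):
    """Append rect as a new entry and relink pointers so traversal runs left to right.

    table maps address -> [top, bottom, left, right, pointer]. Address 1 is the
    head entry and holds only a pointer. A pointer of 0 marks the last entry.
    """
    address = max(table) + 1
    left = rect[2]
    prev = 1
    # Walk the chain until the next entry lies to the right of the new rectangle.
    while table[prev][4] != 0 and table[table[prev][4]][2] <= left:
        prev = table[prev][4]
    table[address] = list(rect) + [table[prev][4]]
    table[prev][4] = address
    return address


def traversal(table):
    """Yield entry addresses in pointer order, starting from the head entry."""
    address = table[1][4]
    while address != 0:
        yield address
        address = table[address][4]


table = {1: [None, None, None, None, 0]}
# (top, bottom, left, right) for A, r, B and C in the order they are completed.
# The coordinates are invented; only their left-to-right order matters here.
for rect in [(10, 40, 300, 320), (14, 40, 400, 415),
             (12, 42, 200, 220), (14, 44, 100, 120)]:
    insert_rectangle(table, rect)

print(list(traversal(table)))                  # addresses in the order C, B, A, r
print([table[a][4] for a in sorted(table)])    # pointer column of the table
```

Storing only a pointer per entry lets each new rectangle be linked into place without moving the position data already written to the table memory.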

(C) Detection of skew of the document

In the operation, the rectangles which belong to any one of the character rows of the document 1 are 
picked up and the skew of the document 1 in the XY coordinates is detected. 

Detailed operations are as follows:

(i) The control unit 20 activates a skew calculation device 26 shown in Fig. 2. The skew calculation
device 26 accesses all entries of the first list (TABLE 4) of the table memory 25 in the order specified by
the pointers to fetch the position data, i.e. the top, bottom, left and right edge values of each rectangle,
and calculates the size of each rectangle in both the X and Y directions. The skew calculation device 26
then calculates the average size of all rectangles. The operations are represented as a block 304 in
Fig. 3.

(ii) The skew calculation device 26 again accesses all entries of the first list (TABLE 4) in the order
specified by the pointers to find a first rectangle falling into an allowed range around the average size.
When the skew calculation device 26 finds the first rectangle, it stores in a register, not shown, the
bottom and left values, i.e. the bottom left corner, of the first rectangle.
In the exemplary case, the first rectangle is the rectangle of the character D. The skew calculation
device 26 then continues the access of the first list to find a second rectangle which is located to the
right of the first rectangle, falls into said allowed range and satisfies the following conditions (1)
and (2):

$$X_f + X_1 < X_s < X_f + X_2 \tag{1}$$
$$Y_f - Y_1 < Y_s < Y_f + Y_2 \tag{2}$$

These values represent the positions and distances shown in Fig. 5A.
$X_f$, $Y_f$ ... Position of the bottom-left corner of the first rectangle 51
$X_s$, $Y_s$ ... Position of the bottom-left corner of the second rectangle 52
$X_1$ ... Predetermined distance from $X_f$
$X_2$ ... Predetermined distance from $X_f$
$Y_1$ ... Predetermined distance from $Y_f$, and
$Y_2$ ... Predetermined distance from $Y_f$

The value $X_1$ is experimentally selected to accommodate the case where the first rectangle is the
narrowest one, for a narrow character such as "1". The value $X_2$ is also experimentally selected to
accommodate the case where the second rectangle is double-spaced from the first rectangle. The values
$Y_1$ and $Y_2$ are also experimentally selected to accommodate the case where the maximum skew angle of
the document 1 is 5°. The other consideration made in selecting the value $Y_2$ is to reject a character,
such as "p" in Fig. 5C, having a long extension below its base line, since the skew of the document 1 is
determined based upon the positions of the bottom-left corners of the rectangles, as stated hereinbelow.

The values $X_1$, $X_2$, $Y_1$ and $Y_2$ define an area 53, as shown in Figs. 5A, 5B and 5C. The conditions
(1) and (2) determine whether the bottom-left corner ($X_s$, $Y_s$) of the second rectangle 52 falls into
the area 53. The second rectangle 52 in Fig. 5A satisfies the conditions (1) and (2). The second rectangle
55 in Fig. 5B does not satisfy the conditions (1) and (2). The second rectangle 57 in Fig. 5C satisfies the
condition (1), but does not satisfy the condition (2).
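
The test of conditions (1) and (2) may be sketched as below. The function name `in_search_area`, the distances and the corner coordinates are assumptions chosen so that the three calls resemble the situations of Figs. 5A, 5B and 5C; the Y axis is taken to increase downward, as in the image buffer.

```python
def in_search_area(first, second, x1, x2, y1, y2):
    """Check conditions (1) and (2) for two bottom-left corners given as (x, y)."""
    xf, yf = first
    xs, ys = second
    cond1 = xf + x1 < xs < xf + x2   # condition (1), X direction
    cond2 = yf - y1 < ys < yf + y2   # condition (2), Y direction
    return cond1 and cond2


# Hypothetical predetermined distances in pels.
X1, X2, Y1, Y2 = 5, 60, 6, 6
first = (100, 200)

print(in_search_area(first, (130, 203), X1, X2, Y1, Y2))  # neighbor on the same row
print(in_search_area(first, (190, 230), X1, X2, Y1, Y2))  # too far away in X and Y
print(in_search_area(first, (130, 215), X1, X2, Y1, Y2))  # descender: X passes, Y fails
```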

When the skew calculation device 26 finds the second rectangle which satisfies the conditions (1) and (2),
in the case of Fig. 1A the rectangle of the character C, it stores in the register the bottom and left
position data of the second rectangle. It is noted that the register of the skew calculation
device 26 now stores the position data of the bottom-left corners of the first and second rectangles. The
skew calculation device 26 then continues the access of the first list (TABLE 4) to find the third
rectangle, which has a bottom-left corner falling into the area 53 of the second rectangle. When it finds
the third rectangle, in this case the rectangle of the character B, it stores the bottom and left position
data of the third rectangle in the register. In this manner, the skew calculation device 26 finds a series
of rectangles each of which has its bottom-left corner falling into the area 53 of the preceding rectangle,
and determines whether the number of the found rectangles is equal to a predetermined number, such as 15.
The number was selected under the assumption that a standard English letter in a standard typed format
includes at least 15 average size characters in one character row. Any other number could be used.
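
The search for a series of rectangles may be sketched as follows. This is an illustration only: it assumes the corners have already been filtered by size and ordered from left to right, it tries each corner in list order as the first rectangle, and it uses a required count of 5 instead of 15 to keep the data short. The function name `find_row_chain` and all coordinates are hypothetical.

```python
def find_row_chain(corners, required, x1, x2, y1, y2):
    """Return the first chain of `required` bottom-left corners, or None.

    corners is a list of (x, y) tuples ordered from left to right. A corner
    joins the chain when it satisfies conditions (1) and (2) relative to the
    corner most recently added to the chain.
    """
    def in_area(prev, cur):
        return (prev[0] + x1 < cur[0] < prev[0] + x2
                and prev[1] - y1 < cur[1] < prev[1] + y2)

    for start in range(len(corners)):
        chain = [corners[start]]          # reset the stored corners
        for candidate in corners[start + 1:]:
            if in_area(chain[-1], candidate):
                chain.append(candidate)
                if len(chain) == required:
                    return chain
    return None


# A short upper row of four corners and a longer lower row of six corners,
# both drifting downward to the right, merged in left-to-right order.
corners = [(60, 53), (90, 104), (100, 55), (130, 106), (140, 57),
           (170, 108), (180, 59), (210, 110), (250, 112), (290, 114)]
chain = find_row_chain(corners, required=5, x1=5, x2=60, y1=6, y2=6)
print(chain)
```

The search that starts from the upper row stops at four corners, so the stored corners are discarded and the chain is eventually found on the longer lower row.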

At the termination of the first search of the first list, started by specifying the rectangle of the
character D as the first rectangle, the skew calculation device 26 has found the four rectangles of the
characters D, C, B and A. Since this number is smaller than 15, the skew calculation device 26 resets the
data stored in the register, and starts the second search of the first list by specifying the rectangle of
the character C as the first rectangle. It is apparent that the three searches made by specifying the
rectangles of the characters C, B and A, respectively, as the first rectangle do not find 15 rectangles
each of which has its bottom-left corner falling into the area 53 of the preceding rectangle, i.e.
satisfying the conditions (1) and (2). The skew calculation device 26 starts the fifth search by specifying
the rectangle of the character 1 as the first rectangle. It is assumed that the second character row of
the document 1 in Fig. 1A, beginning with the character 1, includes 15 average size characters, though only
7 characters are shown in Fig. 1A. At the termination of the fifth search, the skew calculation device 26
has found 15 rectangles, and the position data representing the bottom-left corners of the 15 found
rectangles have been stored in the register. The skew calculation device 26 fetches the position data in
the register, and generates a skew angle of the second character row by using the method of least squares,
which is well known in the art. The skew calculation device 26 keeps the skew angle for use as the skew
angle of the document 1 in the later process.
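
A least-squares estimate of the skew angle from the stored corners may be sketched as below. The description does not specify the exact formulation, so fitting a straight line y = a·x + b and taking the angle of its slope is an assumption, as are the function name `skew_angle`, the sign convention and the synthetic corner data.

```python
import math


def skew_angle(corners):
    """Fit y = a*x + b to (x, y) corners by least squares.

    Returns the angle of the fitted slope in degrees. With the Y axis pointing
    downward, a positive angle means the row descends toward the right.
    """
    n = len(corners)
    mean_x = sum(x for x, _ in corners) / n
    mean_y = sum(y for _, y in corners) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in corners)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in corners)
    return math.degrees(math.atan2(sxy, sxx))


# Fifteen synthetic bottom-left corners lying on a row skewed by 3 degrees.
slope = math.tan(math.radians(3.0))
corners = [(x, 100 + slope * x) for x in range(0, 600, 40)]
angle = skew_angle(corners)
print(round(angle, 6))
```

Using all found corners in the fit, rather than only the first and last, reduces the influence of any single rectangle whose bottom edge sits slightly off the base line.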

The operations are shown as a block 305 in Fig. 3. The control unit 20 detects the termination of the
operations (C) and starts the next operations (D).

(D) Reorder of the contents of the first list in the table memory 25 and assemble of a second list in a table 
memory 30 

The purpose of the reorder of the contents in the first list is to find the rectangles of the characters
and symbols belonging to one character row among all rectangles stored in the first list, to fetch the
position data of these rectangles, and to store them in a table memory 30 to form a second list.
The following detailed description relates to the above operations for storing the position data of the
rectangles of the characters D, C, B and A, belonging to the first character row of the document 1 shown in
Fig. 1A, into the table memory 30 to form the second list.

Detail operations are as follows. 

(i) The control unit 20 activates the one row select device 27 shown in Fig. 2. The one row select device
27 receives the value of the skew angle θ of the document 1 from the skew calculation device 26 and
stores it in a register 28. The one row select device 27 accesses the first list (TABLE 4) in the table
memory 25 to sequentially fetch the bottom and left values of each rectangle in the order specified by
the pointers, i.e. in the order of the characters shown in the lower part of Fig. 1A. The first values
YBD and XLD fetched from the first list represent the position of the bottom-left corner of the rectangle of
the character D in the XY coordinates. The one row select device 27 performs the following calculation:

YB = (XLD × tan θ) + YBD (3)

The value YB represents a virtual position, on the Y axis, of the bottom-left corner of the rectangle
when the document 1 is rotated in the clockwise direction to correct the skew angle θ, as shown in Fig.
6. The virtual position YB is stored in a register 29 of the one row select device 27. Next, the one row
select device 27 fetches the position data YB1 and XL1 of the rectangle of the character 1, generates
the virtual value YB for the bottom-left corner of that rectangle, and compares the YB for the rectangle of
the character 1 with the YB for the rectangle of the character D to determine which of the two values is
smaller. In this case, the YB for the rectangle of the character D is the smaller, so the content of
the register 29, i.e. the value YB for the rectangle of the character D, is not changed. In this manner, the
one row select device 27 successively accesses all position data in the first list (TABLE 4), generates
the virtual value YB of each rectangle, compares the new value YB with the old value YB stored in the
register 29, and replaces the old value YB by the new value YB if the new value is smaller than the old
value, so that the smallest value YB among all rectangles is finally stored in the register 29.
The smallest value YB is the value YB of the rectangle in the first character row which is located
at the highest position on the document, as shown in Fig. 6. The operations are shown as a block
306 in Fig. 3.

(ii) The one row select device 27 again accesses the first list (TABLE 4) in the table memory 25 in the
order specified by the pointers, i.e. in the order shown in the lower part of Fig. 1A. The one row select
device 27 fetches the position data YBD and XLD of the rectangle of the character D and performs the
calculation (3) described hereinabove. The one row select device 27 then determines whether or not the
calculated value YB falls into a range 61 shown in Fig. 6. The lower limit of the range 61 is
selected to catch the rectangles of the characters having a long leg below the base line, such as the
characters p and y. In this case the answer is YES, so the one row select device 27 decides that
the character D belongs to the first character row, fetches the four position data of the rectangle of
the character D, i.e. YTD, YBD, XLD and XRD, from the entry 7 of the first list (TABLE 4) in the table
memory 25, and stores them in an entry 1 of the second list in the table memory 30. Also, the one row
select device 27 replaces the pointer 7 in the entry 1 of the first list by the pointer 13 held in the entry 7
of the rectangle of the character D, and deletes the contents of the entry 7.

The pointer is modified so as to skip the deleted entry 7 and to point to the entry 13
of the next character 1.

Next, the one row select device 27 fetches the four position data of the rectangle of the character 1 in
the entry 13 of the first list and repeats the calculation and the comparison. The comparison indicates that
the calculated value YB of the rectangle of the character 1 does not fall into the range 61 set for the first
character row, that is, the character 1 does not belong to the first character row. The one row select device
27 takes no further action on the rectangle of the character 1, accesses the entry 18 specified by the
pointer 18 in the entry 13, and repeats the above calculation and comparison. The comparison again
indicates that the calculated value YB of the rectangle of the character "t" does not fall into the range 61,
hence the one row select device 27 terminates the processing of the rectangle of the character "t" and
accesses the entry 5, which is specified by the pointer 5 in the entry 18. The entry 5 stores the four position
data of the rectangle of the character C in the first character row. The comparison indicates that the
calculated value YB of the rectangle of the character C falls into the range 61, and the one row select
device 27 decides that the character C belongs to the first character row. The one row select device 27
fetches the four position data of the rectangle of the character C, i.e. YTC, YBC, XLC and XRC, from the
entry 5 of the first list and stores them in an entry 2 of the second list in the table memory 30. Also, the
one row select device 27 replaces the pointer 5 in the entry 18 of the first list by the pointer 12 held in the
entry 5, and deletes the contents of the entry 5, whereby the entry 5 is skipped in the subsequent operations
and the rectangle of the character "t" is followed by the rectangle of the character "m". In this manner, the
one row select device 27 repeats the above operations for each rectangle in the first list. The contents of the
modified first list and the new second list at the completion of the operations for all entries of the first list
are shown in the Tables 5 and 6, respectively. The above assembly of the second list and the modification of
the first list are shown as a block 307 in Fig. 3.
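
Steps (i) and (ii) can be sketched in Python as follows. This is only an illustrative sketch, not the circuitry of the one row select device 27. The dictionary layout, the function name select_first_row, the separate head argument (standing in for the pointer kept in the entry 1 of the first list) and the parameter range_height (standing in for the extent of the range 61, measured downward from the smallest YB) are assumptions.

```python
import math

def select_first_row(first_list, head, theta, range_height):
    """Move the rectangles of the topmost character row to a second list.

    first_list: dict mapping an entry address to a dict with the keys
                'top', 'bottom', 'left', 'right' and 'next' (0 ends the list).
    head:       address of the first rectangle in pointer order.
    Returns (second_list, new_head); selected entries are deleted from
    first_list and the pointers are rewritten to skip them.
    """
    def virtual_yb(entry):
        # Calculation (3): YB = (XL * tan(theta)) + YB
        return entry['left'] * math.tan(theta) + entry['bottom']

    # Step (i): keep the smallest virtual YB, as register 29 does.
    yb_min = None
    addr = head
    while addr != 0:
        yb = virtual_yb(first_list[addr])
        if yb_min is None or yb < yb_min:
            yb_min = yb
        addr = first_list[addr]['next']

    # Step (ii): second pass, transfer every rectangle inside the range.
    second_list = []
    prev, addr = None, head
    while addr != 0:
        entry = first_list[addr]
        nxt = entry['next']
        if yb_min <= virtual_yb(entry) <= yb_min + range_height:
            second_list.append((entry['top'], entry['bottom'],
                                entry['left'], entry['right']))
            if prev is None:
                head = nxt                       # first rectangle removed
            else:
                first_list[prev]['next'] = nxt   # skip the deleted entry
            del first_list[addr]
        else:
            prev = addr
        addr = nxt
    return second_list, head
```

With entries 7, 13, 18, 5 and 12 linked in that order and holding the rectangles of D, 1, t, C and m, the sketch returns the rectangles of D and C, leaves 13 as the new head and makes the entry 18 point to the entry 12, which matches the pointer changes described above.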



The control unit 20 terminates the operations (D) and starts the next operations (E).

TABLE 5: MODIFIED FIRST LIST

ADDRESS   TOP EDGE   BOTTOM EDGE   LEFT EDGE   RIGHT EDGE   POINTER
1         -          -             -           -            13
2         -          -             -           -            -
3         YTr        YBr           XLr         XRr          0
4         -          -             -           -            -
5         -          -             -           -            -
6         YTq        YBq           XLq         XRq          11
7         -          -             -           -            -
8         YTp        YBp           XLp         XRp          14
9         YT,        YB,           XL,         XR,          15
10        YTn        YBn           XLn         XRn          16
11        YTz        YBz           XLz         XRz          3
12        YTm        YBm           XLm         XRm          17
13        YT1        YB1           XL1         XR1          18
14        YTy        YBy           XLy         XRy          6
15        YTx        YBx           XLx         XRx          8
16        YTw        YBw           XLw         XRw          9
17        YTu        YBu           XLu         XRu          10
18        YTt        YBt           XLt         XRt          12

TABLE 6: SECOND LIST

ENTRY     TOP EDGE   BOTTOM EDGE   LEFT EDGE   RIGHT EDGE
1         YTD        YBD           XLD         XRD
2         YTC        YBC           XLC         XRC
3         YTB        YBB           XLB         XRB
4         YTA        YBA           XLA         XRA

(E) Recognition of the character image specified by the contents of the second list 

It is noted that the position data stored in the second list, i.e. YTD, YBD, XLD, XRD through YTA, YBA,
XLA, XRA, indicate the positions in the image buffer 23 of the rectangles which surround the character
images D, C, B and A stored in the image buffer 23.

The control unit 20 starts the operation (E) by activating the recognition unit 31. The recognition unit 31
is provided with decision trees for recognizing the character images, which are well known in the art.
A detailed description of the decision trees is therefore not given in the specification. The recognition
unit 31 accesses the entry 1 of the second list in the table memory 30 to fetch the position data YTD, YBD,
XLD and XRD, which represent the position of the rectangle surrounding the character D. The recognition
unit 31 then fetches the image data in the image buffer 23 surrounded by the rectangle. The recognition
unit 31 recognizes the image of the character D by use of the decision tree and stores the result in an
output buffer 32. Next, the recognition unit 31 fetches the position data YTC, YBC, XLC and XRC of the
entry 2 in the second list and performs the above operations to store the result of the recognition of the
character C in the output buffer 32. The recognition unit 31 repeats the above operations until all position
data stored in the second list have been used. The operations are shown as a block 308 in Fig. 3.
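
The loop performed in the operation (E) can be sketched in Python as follows. This is only an illustrative sketch. The representation of the image buffer as a list of bit lines, the inclusive treatment of the four edges, and the classify callback, which merely stands in for the decision trees that the specification does not describe, are assumptions.

```python
def crop(image, top, bottom, left, right):
    """Return the part of the image surrounded by a rectangle.

    image: list of bit lines, each a list of 0/1 picture elements.
    The four edges are treated as inclusive indices (an assumption).
    """
    return [line[left:right + 1] for line in image[top:bottom + 1]]

def recognize_row(image, second_list, classify):
    """Recognize every rectangle of one character row in second-list order.

    second_list: tuples (top, bottom, left, right), as in TABLE 6.
    classify:    hypothetical recognizer taking a cropped character image.
    Returns the results in the order they would enter the output buffer.
    """
    output_buffer = []
    for top, bottom, left, right in second_list:
        character_image = crop(image, top, bottom, left, right)
        output_buffer.append(classify(character_image))
    return output_buffer
```

Because the second list is read from the entry 1 onward, the results appear in the output buffer in the same left-to-right order as the characters of the row.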

The control unit 20 detects the termination of the operations (E) and supplies the contents of the output
buffer 32 to an output device. At this point, the four characters of the first character row of the document
1 shown in Fig. 1A have been recognized. The control unit 20 determines the highest position of the
top edge among the rectangles of the characters remaining in the first list, i.e. the position of the top edge
of the rectangle of the character r. Referring to Fig. 1A, this top edge is located at the bit line 2B. The
control unit 20 thus knows that an upper storage area of the image buffer 23, between the top bit line and
the bit line 2B, is now available for storing the next document image. The control unit 20 activates the
scanner 21 to store the next document image into the upper storage area, and modifies the addresses of
the bit lines of the upper storage area so that they continue from the address 23B, which is the last bit line
of the initially stored document image. That is, the top bit line of the upper storage area is assigned the
address 23B + 1, the second bit line is assigned the address 23B + 2, and so on, whereby the continuity
between the newly stored document image in the upper storage area and the initially stored image in the
image buffer 23 is maintained.
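
The reuse of the upper storage area with continuing bit line addresses behaves like a circular buffer, which can be sketched in Python as follows. This is only an illustrative sketch. The class name BitLineBuffer, its methods and the use of a modulo operation to map a continuing address onto a physical bit line are assumptions, not the addressing hardware of the control unit 20.

```python
class BitLineBuffer:
    """Image buffer whose freed top lines are reused with continuing addresses."""

    def __init__(self, size):
        self.size = size            # number of physical bit lines
        self.lines = [None] * size
        self.oldest = 0             # lowest address still holding image data
        self.next_addr = 0          # address given to the next scanned bit line

    def free_slots(self):
        return self.size - (self.next_addr - self.oldest)

    def store(self, line):
        """Store one scanned bit line and return the address assigned to it."""
        if self.free_slots() == 0:
            raise BufferError("no free bit line in the image buffer")
        addr = self.next_addr
        self.lines[addr % self.size] = line   # physical line reused in turn
        self.next_addr += 1
        return addr

    def release_before(self, addr):
        """Mark every bit line above `addr` as available for new image data."""
        self.oldest = max(self.oldest, min(addr, self.next_addr))

    def read(self, addr):
        if not self.oldest <= addr < self.next_addr:
            raise IndexError("bit line address not held in the buffer")
        return self.lines[addr % self.size]
```

In this sketch the first bit line stored after a release receives the address following that of the last bit line already held, so the row positions computed for the remaining rectangles stay valid.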

When the image buffer 23 is filled with the new document image, the control unit 20 performs the
operations (B) through (E). When the operation (E) is completed, the control unit 20 repeats the above
operations until all characters in the document 1 have been recognized.

At the completion of the recognition of each character row, the control unit 20 supplies the contents of the
output buffer 32 to the output device, such as a printer or a display device, whereby the operator obtains
the results of the character recognition of the document 1.

Although the invention has been described using the document 1 skewed or inclined in the
counterclockwise direction, it is apparent that the invention can also recognize the characters of a document
which is skewed in the clockwise direction.

In the embodiment, the segmentation of the characters and symbols is described for a document in which
the characters of the words are written from left to right. The invention is also capable of segmenting
characters and symbols written in the opposite direction, i.e. from right to left, by arranging the pointers
in the first list in the order from the rightmost rectangle to the leftmost rectangle in the character row
and by arranging the area 53 of Fig. 5A on the left side of the rectangle 51.

Using the present invention to recognize documents including three kinds of fonts, i.e. Courier 10,
Courier 12 and Prestige Elite 12, with single vertical spacing between character rows, it has been found
that the characters of a document skewed by a maximum skew angle of 6° in the clockwise or
counterclockwise direction are recognized with clear separation of the character rows, in other words,
without the characters of the second character row shown in Fig. 1A appearing in the first and third
character rows.

Claims 


1. A method of recognising characters printed on a document, the characters being arranged on the
document in character rows, the method comprising the steps of:
storing an image of the document in image storage means;
determining the XY coordinate positions of rectangles in the stored image that are parallel to the X and
Y axes, each rectangle defining the outer boundary of the image of a character and being formed by
detecting the continuity of a character image over successive bit lines of the image;
forming a first list of the rectangle position data in which the rectangles are arranged in order along the
X axis of the stored image according to their X coordinate position data;
determining the identity of a plurality of rectangles as being within a particular character row by
detecting whether the position of a predefined point on each of these rectangles falls within a
predetermined area defined with respect to another one of these rectangles in the first list and based
on the position data of the identified rectangles calculating the angle at which the character row is
skewed with respect to the XY coordinates of the image storage means;
based on the calculated skew angle, determining all the constituent rectangles of each character row;
and rearranging the rectangle position data into a second list according to the membership to a
character row;
supplying from the image storage means to a character recognition means the image data within each
of the rectangles in the order specified in the second list.

2. A method as claimed in Claim 1, wherein the step of determining the positions of the rectangles
includes a step of determining whether each rectangle has a size which falls into a predetermined size
range for the characters to be recognised and during the step of forming a list of position data, ignoring
those rectangles which fall outside the range.

3. A method as claimed in Claim 1, wherein the position data represents the positions of top, bottom, left
and right edges of the rectangle in the XY coordinates.

4. A method as claimed in Claim 1, wherein the step of determining the identity of a plurality of rectangles
within a particular character row includes a step of searching the first list to detect a predetermined
number of rectangles, with a bottom left corner of one rectangle being located within a predetermined
distance in the X and Y axes from a bottom left corner of the preceding rectangle, and a step of
performing calculations of the positions of bottom left corners of the detected rectangles to detect the
skew of the character row and hence of the document.

5. A method as claimed in Claim 1, wherein the step of determining all the constituent rectangles in each
character row includes the steps of:
sequentially reading the position data of the rectangles in the first list in the arranged order;
generating a virtual position of each rectangle by correcting the position in the Y axis of each
rectangle in the first list by the skew;
storing in a register the virtual position of the rectangle located at the highest position on the
document;
sequentially reading again the position data of the rectangles in the first list;
generating the virtual position of each rectangle and determining whether the generated virtual
position falls into a predetermined range from the virtual position stored in the register; and
transferring the position data of a rectangle in the first list, the generated virtual position of which
falls into the predetermined range, to the second list.

6. Character recognition apparatus for recognising characters printed on a document comprising:
image storage means (23) for storing an image of the document;
means (24) for determining the XY coordinate positions of rectangles in the stored image that are
parallel to the X and Y axes, each rectangle defining the outer boundary of the image of a character
and being formed by detecting the continuity of a character image over successive bit lines of the
image;
means for forming a first list of the rectangle position data in which the rectangles are arranged in order
along the X axis of the stored image according to their X coordinate position data;
skew calculation means (26) for determining the identity of a plurality of rectangles as being within a
particular character row by detecting whether the position of a predefined point on each of these
rectangles falls within a predetermined area defined with respect to another one of these rectangles in
the first list and based on the position data of the identified rectangles calculating the angle at which the
character row is skewed with respect to the XY coordinates of the image storage means;
means (27) for determining all the constituent rectangles of each character row based on the calculated
skew angle; and rearranging the rectangle position data into a second list (30) according to the
membership to a character row;
character recognition means (31) for fetching from the image storage means the image data within each
of the rectangles in the order specified in the second list.

7. Character recognition apparatus as claimed in Claim 6, wherein the means for determining the positions
of rectangles determines whether a size of each rectangle falls within a predetermined size range
of the characters to be recognised, and generates the position data only for rectangles falling into the
size range.

8. Character recognition apparatus as claimed in Claim 6, wherein the position data represents upper,
bottom, left and right edges of the rectangles in XY coordinates of the image storage means.

9. Character recognition apparatus as claimed in Claim 7, wherein the first list is stored in a first table
memory; and
wherein the skew calculation means sequentially reads the position data of the rectangles of the
first list in the arranged order, detects a predetermined number of rectangles, in which a bottom left
corner of one rectangle is located within predetermined distances in the X and Y directions from a
bottom left corner of the preceding rectangle and detects a skew of the document based upon the
positions of bottom left corners of the detected rectangles, and wherein the means for determining all
the constituent rectangles of each character row sequentially reads again the position data of the
rectangles of the first list in the arranged order to generate a virtual position of each rectangle which is
a Y axis position corrected by the skew, stores a virtual position located at a highest position of the
document into a register, sequentially reads again the position data of the first list in the arranged order
to generate the virtual positions of the rectangles, determines whether the generated virtual
positions fall into a predetermined range from the virtual position in the register, and transfers the
position data of a rectangle having the virtual position falling into the predetermined range to a second
list.

Claims (translation of the German text, Patentansprüche)

1. A method of recognising characters printed on a document, the characters being arranged on the
document in character rows, the method comprising the steps of:
storing an image of the document in image storage means;
determining the XY coordinate positions of rectangles in the stored image that are parallel to the X and
Y axes, each rectangle defining the outer boundary of the image of a character and being formed by
detecting the continuity of a character image over the successive bit lines of the image;
forming a first list of the position data of the rectangles, in which the rectangles are arranged in order
along the X axis of the stored image according to their X coordinate position data;
determining the identity of a plurality of rectangles as lying within a particular character row by
detecting whether the position of a predetermined point on each of these rectangles lies within a
predetermined area defined with respect to another of these rectangles in the first list and, based on the
position data of the identified rectangles, calculating the angle at which the character row is skewed
with respect to the XY coordinates of the image storage means;
based on the calculated skew angle, rearranging the position data into a second list according to
membership of a character row;
supplying from the image storage means to a character recognition means the image data within each
of the rectangles in the order specified in the second list.

2. A method as in Claim 1, wherein the step of determining the positions of the rectangles includes a
step of determining whether each rectangle has a size which falls within a predetermined size range for
the characters to be recognised and, during the step of forming a list of position data, ignoring those
rectangles which lie outside the range.

3. A method as in Claim 1, wherein the position data represent the positions of the top, bottom, left
and right edges of the rectangles in the XY coordinates.

4. A method as in Claim 1, wherein the step of determining the identity of a plurality of rectangles
within a particular character row includes a step of searching to detect a predetermined number of
rectangles in the first list, with a bottom left corner of one rectangle lying within a predetermined
distance in the X and Y axes from a bottom left corner of the preceding rectangle, and a step of
performing calculations on the position data of the bottom left corners of the detected rectangles to
detect the skew of the character row and hence of the document.

5. A method as in Claim 1, wherein the step of determining the rectangles of which each character row
is formed includes the steps of:
sequentially reading the position data of the rectangles in the first list in the arranged order;
generating a virtual position of each rectangle by correcting the position in the Y axis of each
rectangle in the first list by the skew;
storing in a register the virtual position of the rectangle located at the highest position on the
document;
sequentially reading again the position data of the rectangles in the first list;
generating the virtual position of each rectangle and determining whether the generated virtual
position falls within a predetermined range from the virtual position stored in the register; and
transferring the position data of a rectangle in the first list, the generated virtual position of which
falls within the predetermined range, to the second list.

6. Character recognition apparatus for recognising characters printed on a document, comprising:
image storage means (23) for storing an image of the document;
means (24) for determining the XY coordinate positions of rectangles in the stored image that are
parallel to the X and Y axes, each rectangle defining the outer boundary of the image of a character and






EP 0 287 027 B1 



being formed by detecting the continuity of a character image over the successive bit lines of the
image;
means for forming a first list of the position data of the rectangles, in which the rectangles are arranged
in order along the X axis of the stored image according to their X coordinate position data;
skew calculation means (26) for determining the identity of a plurality of rectangles as lying within a
particular character row by detecting whether the position of a predetermined point on each of these
rectangles lies within a predetermined area defined with respect to another of these rectangles in the
first list and, based on the position data of the identified rectangles, calculating the angle at which the
character row is skewed with respect to the XY coordinates of the image storage means;
means (27) for determining all the rectangles of which each character row is formed, based on the
calculated skew angle, and for rearranging the position data into a second list according to membership
of a character row;
character recognition means (31) for fetching from the image storage means the image data within each
of the rectangles in the order specified in the second list.

7. Character recognition apparatus as in Claim 6, wherein the means for determining the positions of
the rectangles determine whether a size of each rectangle lies within a predetermined size range of the
characters to be recognised, and generate position data only for rectangles which fall within the size
range.

8. Character recognition apparatus as in Claim 6, wherein the position data represent the top, bottom,
left and right edges of the rectangles in the XY coordinates of the image storage means.

9. Character recognition apparatus as in Claim 7, wherein the first list is stored in a first table
memory; and
wherein the skew calculation means read the position data of the rectangles of the first list in the
arranged order, detect a predetermined number of rectangles in which a bottom left corner of one
rectangle lies within the predetermined distances in the X and Y directions from a bottom left corner of
the preceding rectangle, and detect a skew of the document based on the positions of the bottom left
corners of the detected rectangles, and wherein the means for determining all the rectangles of which
each character row is formed sequentially read again the position data of the rectangles of the first list
in the arranged order to generate a virtual position of each rectangle, which is a Y axis position
corrected by the skew, store in a register a virtual position located at the highest position of the
document, sequentially read again the position data of the first list in the arranged order to generate the
virtual positions of the rectangles, determine whether the generated virtual positions fall within a
predetermined range from the virtual position in the register, and transfer the position data of the
rectangle whose virtual position falls within the predetermined range to a second list.

Claims (translation of the French text, Revendications)

1. A method of recognising characters printed on a document, the characters being arranged on the
document in character rows, the method comprising the steps of:
storing an image of the document in image storage means;
determining in the stored image the XY coordinate positions of rectangles which are parallel to the X
and Y axes, each rectangle defining the outer boundary of the image of a character and being formed by
detecting the continuity of a character image over successive bit lines of the image;
forming a first list of the rectangle position data in which the rectangles are arranged in order along the
X axis of the stored image according to their X axis coordinate position data;
determining the identity of a plurality of rectangles as being in a particular character row by detecting
whether the position of a predefined point on each of these rectangles falls within a predetermined area
defined with respect to another of these rectangles in the first list and, on the basis of the position data
of the identified rectangles, calculating the angle of inclination which the character row makes with the
XY coordinates of the image storage means;
on the basis of the calculated angle of inclination, determining all the constituent rectangles of each
character row, and rearranging the rectangle position data into a second list according to membership of
a character row;
supplying from the image storage means to character recognition means the image data within each of
the rectangles in the order specified in the second list.

2. A method according to claim 1, wherein the step of determining the position of the rectangles
comprises a step of determining whether each rectangle has a size which falls within a predetermined
size range for the characters to be recognised and, during the step of forming a list of position data,
disregarding the rectangles which fall outside the range.

3. A method according to claim 1, wherein the position data represent the position of the top, bottom,
left and right edges of the rectangle in the XY coordinates.

4. A method according to claim 1, wherein the step of determining the identity of a plurality of
rectangles in a particular character row comprises a step of searching the first list to detect a
predetermined number of rectangles, a bottom left corner of one rectangle being located within a
predetermined distance in the X and Y axes from a bottom left corner of the preceding rectangle, and a
step of performing calculations on the position of the bottom left corners of the detected rectangles to
detect the inclination of the character row and hence of the document.

5. A method according to claim 1, wherein the step of determining all the constituent rectangles in
each character row comprises the steps of:
successively reading the position data of the rectangles in the first list in the arranged order;
generating a virtual position of each rectangle by correcting the position in the Y axis of each
rectangle in the first list by the inclination;
storing in a register the virtual position of the rectangle located at the highest position on the
document;
successively reading again the position data of the rectangles in the first list;
generating the virtual position of each rectangle and determining whether the generated virtual
position falls within a predetermined range from the virtual position stored in the register; and
transferring the position data of a rectangle in the first list whose generated virtual position falls
within the predetermined range to the second list.

6. Character recognition apparatus for recognising characters printed on a document, comprising:
image storage means (23) for storing an image of the document;
means (24) for determining in the stored image the XY coordinate positions of rectangles which are
parallel to the X and Y axes, each rectangle defining the outer boundary of the image of a character and
being formed by detecting the continuity of a character image over successive bit lines of the image;
means for forming a first list of the rectangle position data in which the rectangles are arranged in order
along the X axis of the stored image according to their X axis coordinate position data;
inclination calculation means (26) for determining the identity of a plurality of rectangles as being in a
particular character row by detecting whether the position of a predefined point on each of these
rectangles falls within a predetermined area defined with respect to another of these rectangles in the
first list and, on the basis of the position data of the identified rectangles, calculating the angle of
inclination which the character row makes with the XY coordinates of the image storage means;
means (27) for determining all the constituent rectangles of each character row on the basis of the
calculated angle of inclination, and for rearranging the rectangle position data into a second list (30)
according to membership of a character row;
character recognition means (31) for extracting from the image storage means the image data within
each of the rectangles in the order specified in the second list.

7. A character recognition apparatus according to claim 6, wherein the means for
determining the positions of the rectangles determine whether the size of each rectangle falls within a
predetermined range of sizes of the characters to be recognized, and generate the position data
only for the rectangles falling within that size range.

8. A character recognition apparatus according to claim 6, wherein the position data
represent the upper, lower, left and right edges of the rectangles in the XY coordinates
of the image storage means.
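
The four-edge position data and the size test described in the two preceding claims can be pictured with a short Python sketch. The record name RectPos, the function names, and the explicit width and height bounds are hypothetical; the claims only require that a rectangle's size fall within a predetermined range for the characters to be recognized.

```python
from typing import List, NamedTuple


class RectPos(NamedTuple):
    """Position data of one rectangle: its four edges in image coordinates."""
    top: int
    bottom: int
    left: int
    right: int


def within_size_range(r: RectPos, min_w: int, max_w: int,
                      min_h: int, max_h: int) -> bool:
    """True if the rectangle is plausibly one character of the expected size."""
    width = r.right - r.left
    height = r.bottom - r.top
    return min_w <= width <= max_w and min_h <= height <= max_h


def position_data(candidates: List[RectPos], min_w: int, max_w: int,
                  min_h: int, max_h: int) -> List[RectPos]:
    """Keep position data only for rectangles inside the size range."""
    return [r for r in candidates
            if within_size_range(r, min_w, max_w, min_h, max_h)]
```

Filtering at this stage keeps specks and large graphics out of the first list, so they cannot disturb the later skew estimate.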

9. A character recognition apparatus according to claim 7, wherein the first list is
stored in a first table memory; and

wherein the skew calculating means successively read the position data of the
rectangles of the first list in the arranged order, detect a predetermined number of rectangles
in which a lower left corner of a rectangle is located within predetermined distances
in the X and Y directions from the lower left corner of the preceding rectangle, and detect a
skew of the document based on the positions of the lower left corners of the detected rectangles, and
wherein the means for determining all the constituent rectangles of each character row
successively read again the position data of the rectangles of the first list in the
arranged order to generate a virtual position of each rectangle, which is a Y-axis position
corrected by the skew, store in a register the virtual position located at the uppermost position of the
document, successively read again the position data of the first
list in the arranged order to generate the virtual positions of the rectangles, determine whether the
generated virtual positions fall within a predetermined range from the virtual position
in the register, and transfer the rectangle position data whose virtual position falls
within the predetermined range to a second list.